A comparative study of metagenomics analysis pipelines at the species level
Authors
Abstract
Many metagenomics classification tools have been developed with the rapid growth of the metagenomics field. However, the classification of closely related species remains a challenge for this field. Here, we compared MetaPhlAn2, kallisto and Kraken for their performances in two metagenomics settings, human metagenomics and environmental metagenomics. Our comparative study showed that kallisto demonstrated higher sensitivity than MetaPhlAn2 and Kraken and better quantification accuracy than Kraken at the species level. We also showed that classification tools that run on full reference genomes misidentified many species that were not truly present. In order to reduce false positives, we introduced marker genes from MetaPhlAn2 into our pipeline, which uses kallisto for the classification step, as an additional filtering step for species detection.
Introduction
The advent of shotgun metagenomic sequencing greatly facilitated the identification and classification of microbes by providing a means to detect phenotypically aberrant or unculturable microbes [1]. It allows a much faster and cheaper taxonomic profiling of microbial communities in different ecosystems such as the microbiome in human, soil and ocean. With the rising use of ...
bioRxiv preprint first posted online Oct. 15, 2016; doi: http://dx.doi.org/10.1101/081141. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Similar resources
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome and is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...
Comparative Metagenomic Analysis of Human Gut Microbiome Composition Using Two Different Bioinformatic Pipelines
Technological advances in next-generation sequencing-based approaches have greatly impacted the analysis of microbial community composition. In particular, 16S rRNA-based methods have been widely used to analyze the whole set of bacteria present in a target environment. As a consequence, several specific bioinformatic pipelines have been developed to manage these data. MetaGenome Rapid Annotati...
Mixture models for analysis of the taxonomic composition of metagenomes
MOTIVATION Inferring the taxonomic profile of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in metagenomics. Because existing methods for taxonomic profiling of metagenomes are all based on the assignment of fragmentary sequences to phylogenetic categories, the accuracy of results largely depends on fragment length. This dependence complic...
Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics
Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines are available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and not trivial. No standard evaluation method exists for estimating the accuracy of...
MetLab: An In Silico Experimental Design, Simulation and Analysis Tool for Viral Metagenomics Studies
Metagenomics, the sequence characterization of all genomes within a sample, is widely used as a virus discovery tool as well as a tool to study viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and finally bioinformatics. Bioinformatic analysis of metagenomic datasets is in itself a complex process, involving few s...